FigShare

Statistical and machine learning approaches to predicting protein-ligand interactions.

Author: Colwell Lucy J
Publication venue: Curr Opin Struct Biol
Publication date: 20/02/2018

Data-driven computational approaches to predicting protein-ligand binding are currently achieving unprecedented levels of accuracy on held-out test datasets. Up until now, however, this has not led to corresponding breakthroughs in our ability to design novel ligands for protein targets of interest. This review summarizes the current state of the art in this field, emphasizing the recent development of deep neural networks for predicting protein-ligand binding. We explain the major technical challenges that have caused difficulty with predicting novel ligands, including the problems of sampling noise and the challenge of using benchmark datasets that are sufficiently unbiased that they allow the model to extrapolate to new regimes.

Comparative analysis of nanobody sequence and structure data.

Author: Colwell Lucy J
Mitchell Laura S
Publication venue: Proteins
Publication date: 01/07/2018

Nanobodies are a class of antigen-binding protein derived from camelids that achieve comparable binding affinities and specificities to classical antibodies, despite comprising only a single 15 kDa variable domain. Their reduced size makes them an exciting target molecule with which we can explore the molecular code that underpins binding specificity: how is such high specificity achieved? Here, we use a novel dataset of 90 nonredundant, protein-binding nanobodies with antigen-bound crystal structures to address this question. To provide a baseline for comparison we construct an analogous set of classical antibodies, allowing us to probe how nanobodies achieve high-specificity binding with a dramatically reduced sequence space. Our analysis reveals that nanobodies do not diversify their framework region to compensate for the loss of the VL domain. In addition to the previously reported increase in H3 loop length, we find that nanobodies create diversity by drawing their paratope regions from a significantly larger set of aligned sequence positions, and by exhibiting greater structural variation in their H1 and H2 loops.

Charge as a Selection Criterion for Translocation through the Nuclear Pore Complex

Author: Brenner Michael P.
Colwell Lucy J.
Ribbeck Katharina
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/09/2009

Nuclear pore complexes (NPCs) are highly selective filters that control the exchange of material between nucleus and cytoplasm. The principles that govern selective filtering by NPCs are not fully understood. Previous studies find that cellular proteins capable of fast translocation through NPCs (transport receptors) are characterized by a high proportion of hydrophobic surface regions. Our analysis finds that transport receptors and their complexes are also highly negatively charged. Moreover, NPC components that constitute the permeability barrier are positively charged. We estimate that electrostatic interactions between a transport receptor and the NPC result in an energy gain of several kBT, which would enable significantly increased translocation rates of transport receptors relative to other cellular proteins. We suggest that negative charge is an essential criterion for selective passage through the NPC.

Funding: Merck Research Laboratories; National Science Foundation (U.S.) (Division of Mathematical Sciences); Kavli Institute for Bionano Science & Technology at Harvard University; National Centers for Systems Biology (U.S.) (NIGMS grant GM068763); National Institute of General Medical Sciences (U.S.
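
To give a feel for why an energy gain of "several kBT" matters, the sketch below assumes a Boltzmann-type dependence, in which a favourable energy change dE scales a rate or occupancy by exp(dE / kBT). The abstract does not state which rate law the authors use, so this relation and the sample values of 1, 3 and 5 kBT are illustrative assumptions only.

```python
import math

def boltzmann_enhancement(energy_gain_kbt: float) -> float:
    """Relative factor exp(dE / kBT) for an energy gain expressed in units of kBT.

    Assumption: a simple Boltzmann weighting; this is not taken from the paper.
    """
    return math.exp(energy_gain_kbt)

# Illustrative values for "several kBT" (hypothetical, chosen for the example).
for gain in (1, 3, 5):
    print(f"{gain} kBT -> factor of about {boltzmann_enhancement(gain):.1f}")
```

Under this assumption, even a few kBT changes the factor by one to two orders of magnitude, which is consistent with the abstract's qualitative claim of significantly increased translocation rates.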

DSpace@MIT

Harvard University - DASH

Conservation Weighting Functions Enable Covariance Analyses to Detect Functionally Important Amino Acids

Author: Brenner Michael P.
Colwell Lucy J.
Murray Andrew W.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 07/11/2014

The explosive growth in the number of protein sequences gives rise to the possibility of using the natural variation in sequences of homologous proteins to find residues that control different protein phenotypes. Because in many cases different phenotypes are each controlled by a group of residues, the mutations that separate one version of a phenotype from another will be correlated. Here we incorporate biological knowledge about protein phenotypes and their variability in the sequence alignment of interest into algorithms that detect correlated mutations, improving their ability to detect the residues that control those phenotypes. We demonstrate the power of this approach using simulations and recent experimental data. Applying these principles to the protein families encoded by Dscam and Protocadherin allows us to make testable predictions about the residues that dictate the specificity of molecular interactions.
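
As a rough illustration of what "detecting correlated mutations" in an alignment can look like, the sketch below computes the mutual information between two alignment columns. Mutual information is a common baseline for covariance analysis; it is used here as a stand-in, and the conservation weighting functions described in the abstract are not reproduced. The function name and the toy alignment are hypothetical.

```python
import math
from collections import Counter

def column_mutual_information(alignment, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    `alignment` is a list of equal-length sequences. No conservation
    weighting or sequence reweighting is applied (simplifying assumption).
    """
    n = len(alignment)
    count_i = Counter(seq[i] for seq in alignment)
    count_j = Counter(seq[j] for seq in alignment)
    count_ij = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi

# Toy alignment: columns 0 and 1 always change together, column 2 varies independently.
toy_alignment = ["AKL", "AKV", "GEL", "GEV"]
coupled = column_mutual_information(toy_alignment, 0, 1)
independent = column_mutual_information(toy_alignment, 0, 2)
```

In the toy alignment, the coupled pair scores higher than the independent pair, which is the signal that covariance methods look for before any weighting is applied.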

CiteSeerX

arXiv.org e-Print Archive

FigShare

Inferring interaction partners from protein sequences.

Author: Bitbol Anne-Florence
Colwell Lucy J
Dwyer Robert S
Wingreen Ned S
Publication venue: Proc Natl Acad Sci U S A
Publication date: 23/09/2016

Specific protein-protein interactions are crucial in the cell, both to ensure the formation and stability of multiprotein complexes and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners, causing their sequences to be correlated. Here we exploit these correlations to accurately identify, from sequence data alone, which proteins are specific interaction partners. Our general approach, which employs a pairwise maximum entropy model to infer couplings between residues, has been successfully used to predict the 3D structures of proteins from sequences. Thus inspired, we introduce an iterative algorithm to predict specific interaction partners from two protein families whose members are known to interact. We first assess the algorithm's performance on histidine kinases and response regulators from bacterial two-component signaling systems. We obtain a striking 0.93 true positive fraction on our complete dataset without any a priori knowledge of interaction partners, and we uncover the origin of this success. We then apply the algorithm to proteins from ATP-binding cassette (ABC) transporter complexes, and obtain accurate predictions in these systems as well. Finally, we present two metrics that accurately distinguish interacting protein families from noninteracting ones, using only sequence data.

Funding: Human Frontier Science Program, National Institutes of Health (Grant ID: R01-GM082938), National Science Foundation (Grant ID: PHY-1305525), Marie Curie (Career Integration Grant ID: 631609), Next Generation Fellowship, Eric and Wendy Schmidt Transformative Technology Fund.

This is the author accepted manuscript. The final version is available from the Proceedings of the National Academy of Sciences of the United States of America via https://doi.org/10.1073/pnas.160676211

Princeton University Open Access Repository

Critiquing Protein Family Classification Models Using Sufficient Input Subsets.

Author: Belanger David
Bileschi Maxwell
Bryant Drew
Carter Brandon
Colwell Lucy J
Sanderson Theo
Smith Jamie
Publication venue: J Comput Biol
Publication date: 01/08/2020

In many application domains, neural networks are highly accurate and have been deployed at large scale. However, users often do not have good tools for understanding how these models arrive at their predictions. This has hindered adoption in fields such as the life and medical sciences, where researchers require that models base their decisions on underlying biological phenomena rather than peculiarities of the dataset. We propose a set of methods for critiquing deep learning models and demonstrate their application for protein family classification, a task for which high-accuracy models have considerable potential impact. Our methods extend the Sufficient Input Subsets (SIS) technique, which we use to identify subsets of features in each protein sequence that are alone sufficient for classification. Our suite of tools analyzes these subsets to shed light on the decision-making criteria employed by models trained on this task. These tools show that while deep models may perform classification for biologically relevant reasons, their behavior varies considerably across the choice of network architecture and parameter initialization. While the techniques that we develop are specific to the protein sequence classification task, the approach taken generalizes to a broad set of scientific contexts in which model interpretability is essential.
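
The idea of a subset of positions that is "alone sufficient for classification" can be sketched with a simplified backward-selection procedure: positions are masked one at a time, always removing the one whose loss hurts the score least, and the most important positions are then added back until the score reaches a threshold. This is a minimal sketch that finds a single subset under stated assumptions. The scoring function, weights and threshold are hypothetical stand-ins for a trained classifier, and the extensions proposed in the paper are not shown.

```python
from typing import Callable, List, Set

Score = Callable[[Set[int]], float]  # model confidence given the kept positions

def back_select(n_positions: int, score: Score) -> List[int]:
    """Order positions from least to most important by greedy masking."""
    keep = set(range(n_positions))
    removal_order = []
    while keep:
        # Remove the position whose masking leaves the highest score.
        best = max(sorted(keep), key=lambda i: score(keep - {i}))
        keep.remove(best)
        removal_order.append(best)
    return removal_order

def sufficient_input_subset(n_positions: int, score: Score, threshold: float) -> List[int]:
    """Return one subset of positions whose score alone meets the threshold."""
    if score(set(range(n_positions))) < threshold:
        return []  # the full input is not confidently classified
    subset: Set[int] = set()
    for pos in reversed(back_select(n_positions, score)):
        subset.add(pos)
        if score(subset) >= threshold:
            break
    return sorted(subset)

# Hypothetical toy "model": each position contributes a fixed weight to the score.
TOY_WEIGHTS = [0.05, 0.4, 0.1, 0.45]

def toy_score(kept: Set[int]) -> float:
    return sum(TOY_WEIGHTS[i] for i in kept)

toy_subset = sufficient_input_subset(len(TOY_WEIGHTS), toy_score, threshold=0.8)
```

With a real classifier, `score` would evaluate the network on the sequence with every position outside `kept` replaced by a mask token; the choice of mask is itself a modelling decision.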

arXiv.org e-Print Archive

Optimal Design of Experiments by Combining Coarse and Fine Measurements.

Author: Brenner Michael P
Colwell Lucy J
Lee Alpha A
Publication venue: Phys Rev Lett
Publication date: 16/10/2017

In many contexts, it is extremely costly to perform enough high-quality experimental measurements to accurately parametrize a predictive quantitative model. However, it is often much easier to carry out large numbers of experiments that indicate whether each sample is above or below a given threshold. Can many such categorical or "coarse" measurements be combined with a much smaller number of high-resolution or "fine" measurements to yield accurate models? Here, we demonstrate an intuitive strategy, inspired by statistical physics, wherein the coarse measurements are used to identify the salient features of the data, while the fine measurements determine the relative importance of these features. A linear model is inferred from the fine measurements, augmented by a quadratic term that captures the correlation structure of the coarse data. We illustrate our strategy by considering the problems of predicting the antimalarial potency and aqueous solubility of small organic molecules from their 2D molecular structure.

L. J. C. acknowledges a Next Generation fellowship and a Marie Curie CIG [Evo-Couplings, Grant No. 631609]. M. P. B. acknowledges support from the Simons Foundation and from the National Science Foundation through DMS-1715477.
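
The division of labour described in the abstract, with coarse data picking out salient features and fine data weighting them, can be sketched in a simplified two-stage form. The version below is a stand-in, not the paper's estimator: it takes as salient directions the eigenvectors along which the covariance of above-threshold samples differs most from the overall covariance, then fits ordinary least squares on the fine measurements projected onto those directions. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def salient_directions(x_coarse, above_threshold, k):
    """Directions where above-threshold samples differ most in covariance.

    Assumption: a covariance-difference eigendecomposition stands in for the
    correlation-structure analysis described in the abstract.
    """
    cov_all = np.cov(x_coarse, rowvar=False)
    cov_hits = np.cov(x_coarse[above_threshold], rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov_hits - cov_all)
    order = np.argsort(-np.abs(eigvals))[:k]  # largest |eigenvalue| first
    return eigvecs[:, order]

def fit_fine(x_fine, y_fine, directions):
    """Least-squares weights (plus intercept) on the projected fine samples."""
    design = np.column_stack([x_fine @ directions, np.ones(len(x_fine))])
    coef, *_ = np.linalg.lstsq(design, y_fine, rcond=None)
    return coef

def predict(x, directions, coef):
    return (x @ directions) @ coef[:-1] + coef[-1]

# Synthetic example: many cheap threshold labels, few quantitative values.
rng = np.random.default_rng(0)
x_coarse = rng.normal(size=(2000, 10))
coarse_hits = x_coarse[:, 0] + x_coarse[:, 1] > 0
x_fine = rng.normal(size=(20, 10))
y_fine = x_fine[:, 0] + x_fine[:, 1]

directions = salient_directions(x_coarse, coarse_hits, k=3)
coef = fit_fine(x_fine, y_fine, directions)
y_hat = predict(x_fine, directions, coef)
```

The design choice illustrated here is that the 20 fine samples only need to determine k + 1 coefficients rather than one per input feature, which is what makes a small number of expensive measurements usable.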

Online Research @ Cardiff

Proline provides site-specific flexibility for in vivo collagen.

Author: Bihan Dominique
Chow Wing Ying
Colwell Lucy J
Duer Melinda J
Farndale Richard W
Forman Chris J
Puszkarska Anna M
Rajan Rakesh
Reid David G
Slatter David A
Wales David J
Publication venue: Sci Rep
Publication date: 01/09/2018

Fibrillar collagens have mechanical and biological roles, providing tissues with both tensile strength and cell binding sites which allow molecular interactions with cell-surface receptors such as integrins. A key question is: how do collagens allow tissue flexibility whilst maintaining well-defined ligand binding sites? Here we show that proline residues in collagen glycine-proline-hydroxyproline (Gly-Pro-Hyp) triplets provide local conformational flexibility, which in turn confers well-defined, low energy molecular compression-extension and bending, by employing two-dimensional 13C-13C correlation NMR spectroscopy on 13C-labelled intact ex vivo bone and in vitro osteoblast extracellular matrix. We also find that the positions of Gly-Pro-Hyp triplets are highly conserved between animal species, and are spatially clustered in the currently accepted model of molecular ordering in collagen type I fibrils. We propose that the Gly-Pro-Hyp triplets in fibrillar collagens provide fibril "expansion joints" to maintain molecular ordering within the fibril, thereby preserving the structural integrity of ligand binding sites.

Funding: BBSRC, EPSRC, Raymond and Beverly Sackler Fund for Physics of Medicine, Wellcome Trust, ER